-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/test-runner: Fix execute() flakiness #142747
Conversation
212594b
to
0af4309
Compare
nixos/lib/test-driver/test-driver.py
Outdated
# (for example when 4094 bytes of output were written, half | ||
# of the status code magic would be in one chunk and the other | ||
# half in the next one). | ||
chunk_to_check = prev_chunk + chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that chunks are sufficiently big; >= 0.5 * magic
chunk_to_check
: it's not really a chunk anymore when they've been combined. Maybe bytes_to_check
?
Maybe StreamReader.readuntil could make this code redundant?
Otherwise, see below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried StreamReader
but it's async stuff and I'm not going to implement that.
nixos/lib/test-driver/test-driver.py
Outdated
return (status_code, (output + output2).decode()) | ||
|
||
output += prev_chunk | ||
prev_chunk = chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could avoid the minimum chunk size assumption by taking something like chunk_to_check[-max_match_size:]
instead of chunk
. Also, prev_chunk
should probably be renamed to something like search_buffer
.
nixos/lib/test-driver/test-driver.py
Outdated
@@ -583,23 +583,44 @@ def require_unit_state(self, unit: str, require_state: str = "active") -> None: | |||
) | |||
|
|||
def execute(self, command: str) -> Tuple[int, str]: | |||
status_code_magic = "|!=EOF" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the trick used in HTTP chunked encoding. If you use a sufficiently long random string, the probability of a false positive is effectively zero.
nixos/lib/test-driver/test-driver.py
Outdated
status_code = int(status_code_b.strip()) | ||
return (status_code, (output + output2).decode()) | ||
|
||
output += prev_chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could create a queue of chunks instead, joining them in one go before .decode()
. That avoids quadratic complexity.
I don't care enough about this problem to learn Python, sorry. Maybe someone else will come up with a better understanding of the language and a better solution (cc @K900). |
nixos/lib/test-driver/test-driver.py
Outdated
# exit code it has, so we execute the command, output a special magic | ||
# string afterwards and then the return code. | ||
out_command = ( | ||
f"( set -euo pipefail; {command} ); echo '{status_code_magic}' $?\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last time we did this sort of thing we did something like this:
(
_stdout=$(mktemp)
_stderr=$(mktemp)
trap 'rm $_stdout $_stderr' EXIT
(set -euo pipefail; {command}) >$_stdout 2>$_stderr
echo RESULT: $? $(stat -c $_stdout $_stderr)
cat $_stdout $_stderr
)
while that is significant extra wrapping it also obviates the need to check for magic substrings, replies can be handled as having one readable line of header and two known-length data blobs following—any output on stderr immediately signals a failure of the wrapper. though that only works well if commands don't echo gigabytes worth of data on stdout/stderr. :/
0af4309
to
71d9e16
Compare
Looks like I do care… I reimplemented everything to use base64. This way we get around this shifting window stuff by just matching |
decoded = chunk.decode() | ||
print(f"c={decoded}") | ||
output_buffer += [decoded] | ||
if decoded[-1] == "\n": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this guaranteed to be the last character? What happens if we receive two lines in one recv
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then this breaks. But I don't see how we would receive two lines when we base64-encode the output and tell base64 to not add \n after 76 characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the exit code is won't be printed before this method returns, so lgtm.
@GrahamcOfBorg eval |
@ofborg eval |
b55613b
to
cbde380
Compare
I had to add a flag to |
cbde380
to
e1c2abe
Compare
e1c2abe
to
fd8ecd6
Compare
nixos/lib/test-driver/test-driver.py
Outdated
@@ -582,24 +582,37 @@ def require_unit_state(self, unit: str, require_state: str = "active") -> None: | |||
+ "'{}' but it is in state ‘{}’".format(require_state, state) | |||
) | |||
|
|||
def execute(self, command: str) -> Tuple[int, str]: | |||
def _read_line_from_shell(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from following the notifications i see that there are still some changes happening, although i had no time this week to look deeper into all of them.
when you found the good/best working version, i suggest renaming this function. from a function with the current name i would expect that it reads output line by line so i could call it repeatedly (or use it as an iterable generator, which would also be very fancy if it yield
ed individual lines). but instead, the function blocks until it receives a long string with potentially multiple lines until the last received portion ends with a newline (which might not even always be the case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was already done (at least that what I thought) but then I found it breaks the tests I now fix and then the documentation was wrong and so on… so it should now really be done.
What would you recommend as a name? It's actually correct imo because it reads a single line from the chardev that connects to the backdoor that is called shell
but I can see how this generic name is misleading. Maybe something like _read_line_from_shell_pipe
or something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While re-reading the text, carefully trying to describe the semantics in my own words, in order to use that for a good name, i got another question:
what does not decoded
really mean? does this boil down to an empty string? i did not find an explanation in what cases decode()
might return a null object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only found one case: this happens which is when the pipe breaks. You then get an empty string (or whatever the type is). The if not decoded
is there to prevent infinite loops when that happens. We can probably not rescue from that situation anyway (pipe is alredy broken) and returning with parts of the output is preferrable to hanging.
The case with the broken pipe is where check_return
comes into play. When you don't set it to False
, execute()
tries to write echo ${PIPESTATUS[0]}\n
into the shell socket which fails when the pipe is broken, thus making the broken pipe detectable as opposed to just hanging forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipe breaks should already be visible at the recv
call, shouldn't they? what do you think about putting the check there and then call the function something along the lines next_newline_closed_block
or something, because this is really what we do. (one might still take such unusual [but correct] names as a hint that the overall design of this function has not reached the ideal state yet although it works for our relevant cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tries to write
echo ${PIPESTATUS[0]}\n
into the shell socket which fails when the pipe is broken
This implies that the test runner (python) broke the pipe, which I can only see happening when the test fails or crashes, so breaking the pipe in this direction doesn't need to be considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it happens. When the hibernate test for example shuts down the VM by means of systemctl hibernate
, the pipe breaks because the qemu process dies. That's exactly where I observed the broken pipes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah of course. These aren't normal processes.
Instead of using the magic string, we now just base64-encode everything and check for a newline.
fd8ecd6
to
c2bdad7
Compare
@tfc any updates to this? I'm starting to feel pressure from the soon to be expected feature freeze |
Looks like this PR broke at least one nixos test: #144613 |
I was bisecting |
There's a workaround: #144679 |
nixos/tests.plasma5: Fix after #142747
Recently, the implementation behind Machine.execute() and thus also Machine.succeed() has been changed[1] to pipe all the command's output into base64 on the guest machine. Unfortunately this means that base64 is blocking until stdout is closed, which in turn means that we now need to make sure that whenever we run a program in background via "&" we also need to make sure to close stdout. In the PSI test, we're doing this by simply redirecting the output to stderr. [1]: NixOS/nixpkgs#142747 Signed-off-by: aszlig <[email protected]>
In NixOS#142747, the implementation behind Machine.execute() has been changed to pipe all the command's output into base64 on the guest machine. Unfortunately this means that base64 is blocking until stdout is closed, which in turn means that we now need to make sure that whenever we run a program in background via "&" we also need to make sure to close stdout, which we do by redirecting stdout to stderr. Signed-off-by: aszlig <[email protected]>
Motivation for this change
Things done
sandbox = true
set innix.conf
? (See Nix manual)switchTest
nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)